DETAILED ACTION



Response to Arguments 



1. Applicant's arguments filed 07/06/2003 have been fully considered but they are
not persuasive. 

Regarding claim 1, Applicant states in the amendment, on page 13 last 
paragraph, lines 1-6 and page 14 first paragraph, lines 7-9, Kanevsky '091 does not 
disclose generating word form components in at least one of the subsets by splitting 
word forms having frequencies less than a threshold, and generating a language 
component vocabulary VC comprising word forms and word form components, as 
essentially claimed in claim 1.

However, Kanevsky et al. teaches splitting word forms having frequencies less 
than a threshold, C.4. lines 58-63, and generating a language component vocabulary VC 
comprising word forms and word form components, C.4. lines 18-22 "A first component 
of an integer vector may be referred to as a "stem" and a second component as an 
"ending" and, thus, the map Tm may be considered as a "split" of words using 
"idealized" vocabulary of integers. The vocabulary of word forms and word form 
components are created, and further represented as "idealized" vocabulary of integers", 
C.2. lines 18-25 "This assignment of integer vectors to word forms can be thought of as
the splitting of word forms into a tuple of sub-words (e.g. prefix/stems/ending),
numerating all these sub-words and thus getting tuples of indexed integers. It is to be
understood that a "tuple" refers to an ordered set of components. For instance, 4-tuple
of integers may be a set: (5, 2, 10, 120), while a 6-tuple of sub-words may be a set: (us,
ed, in, speech, reco, gnition)". The generated language component vocabulary VC
comprises word forms and word form components, which are simply represented as
integers.

Claim Rejections - 35 USC § 102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21 (2) 
of such treaty in the English language. 

3. Claims 1-4, and 48-51 are rejected under 35 U.S.C. 102(e) as being anticipated 
by Kanevsky et al. (U.S. Patent No. 6,073,091 filed Aug. 6, 1997). 

As per claims 1 and 48, Kanevsky et al. discloses a method for generating a 
language component vocabulary VC for a speech recognition system having a language 
vocabulary V of a plurality of word forms, the method comprising the steps of: 

partitioning the language vocabulary V into subsets of word forms based on 
frequencies of occurrence of the respective word forms (C.3. lines 52, 53); and 

in at least one of said subsets, splitting word forms having frequencies less than 
a threshold to thereby generate word form components (C.4. lines 58-63); and 

generating a language component vocabulary VC comprising word forms and 
word form components (C.4. lines 18-22 "A first component of an integer vector may be
referred to as a "stem" and a second component as an "ending" and, thus, the map Tm
may be considered as a "split" of words using "idealized" vocabulary of integers. The
vocabulary of word forms and word form components are created, and further
represented as "idealized" vocabulary of integers", C.2. lines 18-25 "This assignment of
integer vectors to word forms can be thought of as the splitting of word forms into a
tuple of sub-words (e.g. prefix/stems/ending), numerating all these sub-words and thus
getting tuples of indexed integers. It is to be understood that a "tuple" refers to an
ordered set of components. For instance, 4-tuple of integers may be a set: (5, 2, 10,
120), while a 6-tuple of sub-words may be a set: (us, ed, in, speech, reco, gnition)"; the
generated language component vocabulary VC comprises word forms and word form
components, which are simply represented as integers).
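
For illustration only, the three steps recited in claim 1 (partitioning by frequency, splitting word forms below a threshold, and collecting word forms and word form components into VC) can be sketched as follows. This is a minimal sketch and is not taken from either the application or the Kanevsky reference: the ending list `ENDINGS`, the value of `THRESHOLD`, and the suffix-matching rule in `split_word` are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical ending list and threshold, chosen only for illustration.
ENDINGS = ("ing", "ed", "s")
THRESHOLD = 2


def split_word(word, endings=ENDINGS):
    """Split a word form into (stem, ending); the ending is "" if none matches."""
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending
    return word, ""


def build_component_vocabulary(corpus_tokens, threshold=THRESHOLD):
    """Return (unsplit word forms, split map, component vocabulary VC)."""
    freq = Counter(corpus_tokens)
    # Partition V into two subsets by frequency of occurrence.
    frequent = {w for w, c in freq.items() if c >= threshold}
    rare = set(freq) - frequent
    # Split only the word forms whose frequency is below the threshold.
    split_map = {w: split_word(w) for w in rare}
    components = set()
    for stem, ending in split_map.values():
        components.add(stem)
        if ending:
            components.add(ending)
    # VC holds both unsplit word forms and word form components.
    return frequent, split_map, frequent | components
```

With the token sequence "the cat walked the dog walks the cat", the word forms "the" and "cat" stay whole, while "walked" and "walks" are split and share the stem "walk".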

As per claims 2 and 49, Kanevsky et al. discloses all of the limitations of claim 1,
upon which claim 2 depends. Kanevsky et al. further discloses:

the frequencies of the word forms are estimated from a given textual corpus
(C.4. lines 13, 14).

As per claims 3 and 50, Kanevsky et al. discloses all of the limitations of claim 1,
upon which claim 3 depends. Kanevsky et al. further discloses:

said partitioning step includes the sub-step of numerating the plurality of word
forms in the language vocabulary V in descending order based on the frequencies
associated with each of the plurality of word forms (C.4. lines 10-14).

As per claims 4 and 51, Kanevsky et al. discloses all of the limitations of claim 1,
upon which claim 4 depends. Kanevsky et al. further discloses:

said partitioning step partitions the language vocabulary V into at least two
subsets S1 and S2, and said splitting step splits the word forms of subset S2 into
2-tuple components including stems and endings, but does not split the word forms of
subset S1 (C.4. lines 58-63).

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 5, 7-12, 52 and 54-59 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Kanevsky et al. (U.S. Patent No. 6,073,091, filed Aug. 6, 1997) in 
view of Kanevsky et al. (U.S. Patent No. 5,835,888 Nov. 10, 1998). 

Kanevsky et al. (U.S. Patent No. 6,073,091) and Kanevsky et al. (U.S. Patent No.
5,835,888) are analogous art in that they both involve language modeling for speech
recognition.

As per claims 5 and 52, Kanevsky et al. (U.S. Patent No. 6,073,091) discloses 
all of the limitations of claim 4, upon which claim 5 depends. Kanevsky further discloses: 

a splitting step comprising 3-tuple components (C.4. lines 26-28, 30, 31) 

Kanevsky et al. does not disclose: 

further partitioning the language vocabulary V into a third subset S3, with word 
forms therein being split in said splitting step into 3-tuple components including prefixes, 
stems and endings.

However, as it is well known in the art, Kanevsky et al. (U.S. Patent No.
5,835,888 Nov. 10, 1998) teaches partitioning the language vocabulary V into a subset
that includes prefixes, stems, and endings (C.4. lines 18-20). Therefore, at the time of
the invention, it would have been obvious to combine Kanevsky et al. with Kanevsky et
al. for the purpose of increasing the component size in a vocabulary set which would
have increased the recognition of larger words that included a prefix, stem and ending
while decreasing the size of dictionary needed to match these words.

As per claims 7 and 54, Kanevsky et al. (U.S. Patent No. 6,073,091) discloses all
of the limitations of claim 1, upon which claim 7 depends. Kanevsky et al. further
discloses:

said splitting is performed using a fixed vocabulary (C.4. lines 14, 15 - N=400,000);

Kanevsky et al. does not disclose:

a fixed list of allowable endings, with each word from the fixed vocabulary being 
split into at least a stem and an ending that is an element of the fixed set of endings. 

However, as it is well known in the art, Kanevsky et al. (U.S. Patent No. 
5,835,888) teaches having a fixed list of endings and each word from the vocabulary 
being split into a stem and an ending that is an element of the fixed set of endings 
(C.5. lines 9-13). Therefore, at the time of the invention, it would have been obvious to
combine Kanevsky et al. with Kanevsky et al. for the purpose of having a limit to the 
amount of vocabulary and stem and ending sets which would increase the processing 
time for a query into which word is to be recognized.

As per claims 8 and 55, Kanevsky et al. (U.S. Patent No. 6,073,091) and
Kanevsky et al. (U.S. Patent No. 5,835,888) disclose all of the limitations of claim 7,
upon which claim 8 depends. Kanevsky et al. (U.S. Patent No. 6,073,091) does not
disclose:

the fixed set of allowable endings includes an empty ending; 

However, as it is well known in the art, Kanevsky et al. (U.S. Patent 5,835,888) 
teaches having a list of allowed endings that includes empty endings (C.3. lines 50-54). 
Therefore, at the time of the invention, it would have been obvious to combine 
Kanevsky et al. with Kanevsky et al. for the purpose of having a limit to the amount of 
vocabulary and stem and ending sets and having an empty ending for the case where 
the stem doesn't have an ending which would increase the processing time for a query 
into which word is to be recognized. 

As per claims 9 and 56, Kanevsky et al. (U.S. Patent No. 6,073,091) discloses all
of the limitations of claim 1, upon which claim 9 depends. Kanevsky et al. does not
disclose:

generating and storing a word form to corresponding word form components table;

However, as it is well known in the art, Kanevsky et al. (U.S. Patent No. 
5,835,888) teaches generating and storing a word form and its stem and endings in a 
table (C.3. lines 50, 51). Therefore, at the time of the invention, it would have been
obvious to combine Kanevsky et al. with Kanevsky et al. for the purpose of efficiently
managing the word forms to word form components for further processing.

As per claims 10 and 57, Kanevsky et al. (U.S. Patent No. 6,073,091) and
Kanevsky et al. (U.S. Patent No. 5,835,888) disclose all of the limitations of claim 9,
upon which claim 10 depends. Kanevsky et al. (U.S. Patent No. 6,073,091) does not
disclose: 

labeling each of the word form components stored in said table to distinguish 
between stems, prefixes and endings; 

However, as it is well known in the art, Kanevsky et al. (U.S. Patent 5,835,888) 
teaches labeling the word components in the stored table to distinguish between 
components (Fig. 3A-the prefix, root/stem, and end are labeled). Therefore, at the time 
of the invention, it would have been obvious to combine Kanevsky et al. with Kanevsky 
et al. for the purpose of not confusing the tags associated with each segment of the 
word form. 

As per claims 11 and 58, Kanevsky et al. (U.S. Patent No. 6,073,091) discloses
all of the limitations of claim 1, upon which claim 11 depends. Kanevsky et al. (U.S.
Patent No. 6,073,091) further discloses:

generating a map of said word forms to said word form components (C.4. lines
30-36 - "...word forms are mapped into corresponding stem and ending numbers." - the
numbers are interpreted as the components), said map further including each of a
plurality of no-split words as being associated with itself (C.4. lines 59, 60);

Kanevsky et al. (U.S. Patent No. 6,073,091) does not disclose: 

filtering a textual corpus using the map to generate a textual component corpus
containing the non-split word forms and the word form components of the map;

accumulating the word form components and the non-split word forms generated
by said filtering step in an n-gram language model; and

determining counts of n-tuple sets of word form components and word forms to 
estimate n-gram probabilities for the n-gram language. 

However, as it is well known in the art, Kanevsky et al. (U.S. Patent No. 5,835,888)
teaches filtering a textual corpus using the map to generate a textual component corpus
(C.4. lines 18-20 - the sub-vocabularies are interpreted as the component corpus) and
accumulating the word form components and the non-split word forms generated by
said filtering step in an n-gram language model (C.5. lines 48-59) and determining
counts of n-tuple (C.5. lines 54-56 - lists the n-tuple sets) sets of word form components
and word forms to estimate n-gram probabilities for the n-gram language (C.5. lines
63-65 - the counts are n-gram based from the n-tuple sets). Therefore, at the time of the
invention it would have been obvious to combine Kanevsky et al. with Kanevsky et al.
The motivation for doing so would have been to obtain a corpus of corresponding word
forms to components and generate a way to find the probability of the components
correctly matching the full word forms without consuming an enormous amount of
memory space due to a model which only incorporated the full forms of the word,
which would improve the word recognition without substantially increasing the need for
data space.
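
As a hedged illustration of the limitations discussed for claims 11 and 58 (filtering a corpus through the word form to component map, then counting n-tuples to estimate n-gram probabilities), a minimal sketch follows. The function names, the example map, and the unsmoothed maximum-likelihood estimate are assumptions made for the example and are not drawn from the application or from either Kanevsky reference.

```python
from collections import Counter


def filter_corpus(tokens, split_map):
    """Replace each split word form by its components; others map to themselves."""
    out = []
    for tok in tokens:
        parts = split_map.get(tok, (tok,))
        out.extend(p for p in parts if p)  # drop empty endings
    return out


def ngram_counts(units, n):
    """Count n-tuples of consecutive units in the component corpus."""
    return Counter(tuple(units[i : i + n]) for i in range(len(units) - n + 1))


def bigram_probability(units, first, second):
    """Unsmoothed maximum-likelihood estimate of P(second | first)."""
    bigrams = ngram_counts(units, 2)
    history = sum(c for (a, _), c in bigrams.items() if a == first)
    return bigrams[(first, second)] / history if history else 0.0
```

Because split word forms share stems, the component corpus yields counts over a smaller set of units than a model built on full word forms alone.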

As per claims 12 and 59, Kanevsky et al. (U.S. Patent No. 6,073,091) and
Kanevsky et al. (U.S. Patent No. 5,835,888) disclose all of the limitations of claim 11,
upon which claim 12 depends. Kanevsky et al. (U.S. Patent No. 6,073,091) further
discloses:

mapping every word in the corpus into a n-tuple word form component (C.2. lines 
19-21). 

6. Claims 6 and 53 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kanevsky et al. (U.S. Patent No. 6,073,091) in view of Karaali et al. (U.S. Patent No. 
5,930,754 filed Jun. 13, 1997). 

As per claims 6 and 53, Kanevsky et al. (U.S. Patent No. 6,073,091) discloses 
all of the limitations of claim 1, upon which claim 6 depends. Kanevsky et al. does not 
disclose: 

splitting is performed subject to a constraint in which a word that contains a given 
string of letters is prevented from being split within the string if the string of letters 
corresponds to one phoneme. 

However, as it is well known in the art, Karaali et al. teaches multiple letters
corresponding to a single phone and, in the alignment, not aligning a different phone
with the multiple letters. Therefore, at the time of the invention, it would have been
obvious to combine Kanevsky et al. with Karaali et al. The motivation for doing so would 
have been to align corresponding letter pairs with the single phone for the purpose of 
improving the accuracy of speech recognition due to a well known method of aligning 
graphemes to phonemes. 

Allowable Subject Matter 

7. Claims 42-47, and 60 are allowed.

8. The following is a statement of reasons for the indication of allowable subject
matter:

Regarding claims 42 and 60 as understood by the Examiner, the closest prior
art of Kanevsky et al. (U.S. Patent No. 5,835,888) reads on providing a fixed set of
allowable endings, including an empty ending (C.3. lines 50-53, C.5. lines 9-13) and
providing a fixed set of constraints for splitting words into stems (C.5. lines 10-13),
randomly splitting a word to generate an ending from the fixed list of allowable endings
(C.5. lines 9-16), defining and storing a stem set containing the stem generated at said
splitting and a word set containing the word (C.3. lines 50-53 - the table stores the stem
and the word), determining possible splits for a word to generate stems and endings
therefrom, using the fixed set of allowable endings and the fixed set of constraints
(C.5. lines 9-13).

Prior art does not teach nor fairly suggest: 

the steps and combination of (c) initializing a split map of words and the 
corresponding stems and endings by setting a variable t to a predetermined value, and 
selecting a first word from the fixed vocabulary, (f) determining whether t is less than the 
size of the vocabulary, obtaining a new word from the vocabulary V, when t is less than 
the size of the vocabulary, (h) determining possible splits for the new word to generate 
stems and endings therefrom, using the fixed set of allowable endings and the fixed set 
of constraints, (i) determining whether there is a split for the new word that generates a 
previously stored stem of the stem set, (j) splitting the current word into the previously 
stored stem and an ending of the set of allowable endings, when there is a split for the
new word that generates the previously stored stem of the stem set, (k) determining
whether another previously stored stem in the stem set can be replaced by a new stem
generated, when there is no split for the current word that generates the previously
stored stem of the stem set, (l) redefining the stem set and the split map to include the
new stem generated at (h) in place of the other previously stored stem, when the other 
previously stored stem can be replaced by the new stem generated at step (h), (m) 
redefining the stem set to include any new stem into which the current word may be split 
and extending the split map to include the current word by splitting the new word into 
the new stem, when the other previously stored stem in the stem set cannot be replaced 
by the new stem generated at step (h), and (n) incrementing t and returning to step (f) if 
t is less than the size of the vocabulary V. 

Claims 43-47 are allowable as they further limit their parent claims. 

9. As allowable subject matter has been indicated, applicant's reply must either 
comply with all formal requirements or specifically traverse each requirement not 
complied with. See 37 CFR 1.111(b) and MPEP § 707.07(a).

10. Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance."

Conclusion



11. Applicant's amendment necessitated the new ground(s) of rejection presented in
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37
CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

12. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Lamont M Spooner whose telephone number is
703/305-8661. The examiner can normally be reached on 8:00 AM - 5:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Nguyen Vo can be reached on 703/308-6728. The fax phone number for 
the organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 